Phonetic Variation Analysis Via Multi-Factor Sparse Plus Low Rank Language Model
Authors
Abstract
Phonetic transcriptions contain rich information about language. First, the sequential patterns in phonetic transcripts reveal information about the language's phonotactics. When combined with lexical information, this can help to grow or correct pronunciation dictionaries and to improve grapheme-to-phoneme prediction. Second, the places where pronunciations deviate from the norm can be equally informative, for example by providing cues for speaker traits such as accent, dialect or sociolect. Detecting speaker characteristics is interesting in itself and can also be used to improve speech recognition system performance (Biadsy, 2011). In this extended abstract we describe ongoing work to automatically analyze both the regularities and the exceptions (deviations) in phonetic sequences. We use the Multi-Factor Sparse Plus Low Rank Language Model (Hutchinson et al., 2013), which was shown to effectively model regularities and exceptions in word sequences (e.g., by identifying lexical deviations characteristic of topic or speaker role). Preliminary results modeling commonalities and variation between dialects of American English are promising and suggest several extensions to this work.
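To make the "regularities plus exceptions" idea concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a bigram model over phones whose score for a (next phone, previous phone) pair is the sum of a shared low rank term and a dialect-specific sparse term, normalized with a softmax. The phone inventory, the rank-1 factors `U` and `V`, the dialect names, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy phone inventory (not taken from the paper).
PHONES = ["t", "d", "ah", "r"]

# Low rank part: a rank-1 outer product U[i] * V[j], shared by all dialects.
# It stands in for the regularities common to every speaker group.
U = [0.5, -0.2, 0.8, 0.1]   # one value per next phone
V = [1.0, 0.3, -0.5, 0.2]   # one value per previous phone

# Sparse part: one dictionary per factor value (here, per dialect) that
# stores only the non-zero exceptions, keyed by (next phone, previous phone).
SPARSE = {
    "dialect_A": {("r", "ah"): 2.0},  # assumed dialect-specific deviation
    "dialect_B": {},                  # no exceptions in this toy example
}


def score(nxt, prev, dialect):
    """Unnormalized log-score: low rank term plus sparse exception term."""
    i, j = PHONES.index(nxt), PHONES.index(prev)
    low_rank = U[i] * V[j]
    sparse = SPARSE[dialect].get((nxt, prev), 0.0)
    return low_rank + sparse


def next_phone_probs(prev, dialect):
    """Softmax over all candidate next phones given the previous phone."""
    scores = [score(n, prev, dialect) for n in PHONES]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {n: e / total for n, e in zip(PHONES, exps)}
```

In this sketch the two dialects give identical distributions wherever the sparse dictionaries have no entry, and differ only after "ah", where the assumed exception raises the probability of "r" for `dialect_A`. Inspecting the non-zero sparse entries is what would point to candidate dialect-specific deviations.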
Similar resources
Noisy Matrix Decomposition via Convex Relaxation: Optimal Rates in High Dimensions, by Alekh Agarwal, Sahand Negahban and ...
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix with a second matrix endowed with a complementary form of low-dimensional structure; this set-up includes many statistical models of interest, including factor...
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this set-up includes many statistical models of interest, including ...
A Sparse Plus Low Rank Maximum Entropy Language Model
This work introduces a new maximum entropy language model that decomposes the model parameters into a low rank component that learns regularities in the training data and a sparse component that learns exceptions (e.g. multiword expressions). The low rank component corresponds to a continuous-space language model. This model generalizes the standard ℓ1-regularized maximum entropy model, and has ...
Comparison of Spectral Methods for Spoken Language Identification
Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...
Identifying Broad and Narrow Financial Risk Factors with Convex Optimization
Factor analysis of security returns aims to decompose a return covariance matrix into systematic and specific risk components. To date, most commercially successful factor analysis has been based on fundamental models, although there is a large academic literature on statistical models. While successful in many respects, traditional statistical approaches like principal component analysis and m...